Nyström method
We would like to thank the reviewers for their detailed and insightful comments. We have resolved the concerns regarding SGD and observe similar results with Adam. The Gaussian sketch outperforms SVRG on problem (1) and is robust to the choice of hyperparameters. We will include these results in the final version.
Review for NeurIPS paper: Fourier Sparse Leverage Scores and Approximate Kernel Learning
Summary and Contributions: This submission concerns kernel-based learning methods, which have been well studied in machine learning. Methods such as Random Fourier Features (RFF) and the Nystrom method have been used to scale up kernel methods by computing kernel approximations. Such methods generally fall into two categories: oblivious methods (e.g., RFF and its Tensor Sketch variant) and data-adaptive methods (e.g., the Nystrom method). While the former have the advantage of parallelizability (in settings where data is distributed across different machines), the latter have generally given better kernel approximations until now. This work closes the gap between the two classes of methods by advancing an oblivious sketching method with better approximation guarantees, based on statistical leverage scores.
Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method
The Column Subset Selection Problem (CSSP) and the Nystrom method are among the leading tools for constructing small low-rank approximations of large datasets in machine learning and scientific computing. A fundamental question in this area is: how well can a data subset of size k compete with the best rank k approximation? We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees which go beyond the standard worst-case analysis. Our approach leads to significantly better bounds for datasets with known rates of singular value decay, e.g., polynomial or exponential decay. Our analysis also reveals an intriguing phenomenon: the approximation factor as a function of k may exhibit multiple peaks and valleys, which we call a multiple-descent curve.
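To make the approximation factor concrete, below is a minimal numpy sketch, not the authors' method: the decay profile, uniform sampling rule, and all sizes are illustrative assumptions. It compares the error of projecting onto k sampled columns against the best rank-k error in Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 100, 10

# Synthetic data matrix with polynomial singular value decay (illustrative only).
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
s = 1.0 / (1 + np.arange(d)) ** 2
A = U @ np.diag(s) @ V.T

# Best rank-k approximation error, via the SVD.
sv = np.linalg.svd(A, compute_uv=False)
best_rank_k_err = np.sqrt(np.sum(sv[k:] ** 2))

# CSSP: project A onto the span of k uniformly sampled columns.
S = rng.choice(d, size=k, replace=False)
C = A[:, S]
proj = C @ np.linalg.pinv(C)          # orthogonal projector onto range(C)
cssp_err = np.linalg.norm(A - proj @ A, 'fro')

# The approximation factor studied here: ||A - P_C A||_F / ||A - A_k||_F.
print("approximation factor:", cssp_err / best_rank_k_err)
```

Sweeping k in this sketch is one way to visualize how the factor can rise and fall, which is the shape the multiple-descent curve describes.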
Reviews: Recursive Sampling for the Nystrom Method
The authors provide an algorithm which learns a provably accurate low-rank approximation to a PSD matrix in sublinear time. Specifically, it learns an approximation with low additive error, with high probability, by sampling columns from the matrix according to a certain importance measure, then forming a Nystrom approximation from these columns. The importance measure used is an estimate of the ridge leverage scores, which are expensive to compute exactly ($O(n^3)$). Their algorithm recursively estimates these leverage scores by starting from a set of columns, using those to estimate the leverage scores, sampling a new set of columns according to those probabilities, and repeating. The authors show that when this process is done carefully, the leverage score estimates are accurate enough to yield almost as good an approximation as using the true ridge leverage scores. The cost of producing the final approximation is $O(ns^2)$ computation time and $O(ns)$ evaluations of entries of the PSD matrix.
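As a hedged illustration of the quantities in this review, the sketch below computes exact ridge leverage scores (paying the $O(n^3)$ cost the paper's recursion avoids), samples columns proportionally to them, and forms the resulting Nystrom approximation; the recursive estimation that is the paper's actual contribution is omitted, and the kernel and sizes are assumptions.

```python
import numpy as np

def ridge_leverage_scores(K, lam):
    """Exact lambda-ridge leverage scores l_i = (K (K + lam I)^{-1})_{ii}.
    This O(n^3) computation is what the recursive algorithm approximates."""
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + lam * np.eye(n)))

def nystrom(K, cols):
    """Nystrom approximation built from a set of sampled column indices."""
    C = K[:, cols]
    W = K[np.ix_(cols, cols)]
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
# Gaussian kernel matrix (illustrative choice of kernel).
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-sq / 2)

lam, s = 1e-2, 40
scores = ridge_leverage_scores(K, lam)
cols = rng.choice(len(scores), size=s, replace=False, p=scores / scores.sum())
K_hat = nystrom(K, cols)
print("relative error:", np.linalg.norm(K - K_hat) / np.linalg.norm(K))
```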
Learning with Neural Tangent Kernels in Near Input Sparsity Time
The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide neural nets trained under least squares loss by gradient descent (Jacot et al., 2018). However, despite its importance, the super-quadratic runtime of kernel methods limits the use of the NTK in large-scale learning tasks. To accelerate kernel machines with the NTK, we propose a near input sparsity time algorithm that maps the input data to a randomized low-dimensional feature space so that the inner product of the transformed data approximates their NTK evaluation. Furthermore, we propose a feature map for approximating the convolutional counterpart of the NTK (Arora et al., 2019), which can transform any image in time that is only linear in the number of pixels. We show that in standard large-scale regression and classification tasks, a linear regressor trained on our features outperforms trained neural networks and the Nystrom method with the NTK.
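The following is not the paper's sketching algorithm, only a Monte Carlo sanity check of one of its ingredients: random ReLU features approximate the first-order arc-cosine kernel, a building block of the two-layer ReLU NTK. All sizes and names here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 200_000  # input dimension, number of random features (assumed)

x = rng.standard_normal(d); x /= np.linalg.norm(x)
y = rng.standard_normal(d); y /= np.linalg.norm(y)
theta = np.arccos(np.clip(x @ y, -1.0, 1.0))

# Random ReLU features: phi(v)_j = relu(w_j . v) / sqrt(m), with w_j ~ N(0, I).
W = rng.standard_normal((m, d))
phi = lambda v: np.maximum(W @ v, 0) / np.sqrt(m)

# Monte Carlo estimate vs. the closed form (for unit-norm x, y):
# E[relu(w.x) relu(w.y)] = (sin(theta) + (pi - theta) cos(theta)) / (2 pi).
est = phi(x) @ phi(y)
exact = (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
print("estimate:", est, " closed form:", exact)
```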
Efficient Data-Driven Geologic Feature Detection from Pre-stack Seismic Measurements using Randomized Machine-Learning Algorithm
Lin, Youzuo; Wang, Shusen; Thiagarajan, Jayaraman; Guthrie, George; Coblentz, David
Conventional seismic techniques for detecting subsurface geologic features are challenged by limited data coverage, computational inefficiency, and subjective human factors. We developed a novel data-driven geologic feature detection approach based on pre-stack seismic measurements. Our method employs an efficient and accurate machine-learning approach to extract useful subsurface geologic features automatically. Specifically, our method is based on a kernel ridge regression model. Conventional kernel ridge regression can be computationally prohibitive because of the large volume of seismic measurements. We therefore employ a data reduction technique in combination with conventional kernel ridge regression to improve the computational efficiency and reduce memory usage. In particular, we utilize a randomized numerical linear algebra technique, known as the Nystr\"om method, to effectively reduce the dimensionality of the feature space without compromising the information content required for accurate detection. We provide a thorough computational cost analysis to show the efficiency of our new geologic feature detection method. We further validate its performance using synthetic surface seismic data for 2D acoustic and elastic velocity models. Our numerical examples demonstrate that our new detection method significantly improves computational efficiency while maintaining comparable accuracy. Notably, our method yields a speed-up ratio on the order of $\sim 10^2$ to $\sim 10^3$ in a multi-core computational environment.
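A minimal sketch of the general recipe described here, Nystrom features plugged into kernel ridge regression, is shown below. The kernel choice, landmark count, and synthetic data are assumptions for illustration, not the authors' seismic pipeline.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=0.5):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_krr(X, y, m=100, lam=1e-3, seed=0):
    """Kernel ridge regression on Nystrom features:
    roughly O(n m^2) work instead of the O(n^3) of exact KRR."""
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), size=m, replace=False)]
    K_mm = gaussian_kernel(landmarks, landmarks)
    K_nm = gaussian_kernel(X, landmarks)
    # Nystrom feature map: Z = K_nm K_mm^{-1/2}, so Z Z^T approximates K.
    evals, evecs = np.linalg.eigh(K_mm)
    evals = np.maximum(evals, 1e-10)
    Z = K_nm @ evecs / np.sqrt(evals)
    # Ridge regression in the m-dimensional feature space.
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ y)
    return lambda Xt: (gaussian_kernel(Xt, landmarks) @ evecs / np.sqrt(evals)) @ w

# Toy usage on synthetic data (the paper's inputs are seismic measurements).
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 8))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(2000)
pred = nystrom_krr(X, y)
print("train MSE:", np.mean((pred(X) - y) ** 2))
```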
An Improved Bound for the Nystrom Method for Large Eigengap
Mahdavi, Mehrdad; Yang, Tianbao; Jin, Rong
We develop an improved bound for the approximation error of the Nystr\"{o}m method under the assumption that there is a large eigengap in the spectrum of the kernel matrix. This is motivated by the empirical observation that the eigengap has a significant impact on the approximation error of the Nystr\"{o}m method. Our approach is based on a concentration inequality for integral operators and the theory of matrix perturbation. Our analysis shows that when there is a large eigengap, we can improve the approximation error of the Nystr\"{o}m method, measured in Frobenius norm, from $O(N/m^{1/4})$ to $O(N/m^{1/2})$, where $N$ is the size of the kernel matrix and $m$ is the number of sampled columns.
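As an illustrative numerical companion to this claim (not the paper's analysis), the sketch below builds a PSD matrix with a large eigengap after the r-th eigenvalue, a hypothetical spectrum chosen here for illustration, and tracks the Frobenius-norm Nystrom error as the number of sampled columns m grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 500, 20

# PSD matrix with a large eigengap after the r-th eigenvalue (assumed spectrum).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate([np.linspace(10.0, 5.0, r), 1e-3 * np.ones(n - r)])
K = (Q * eigs) @ Q.T  # Q diag(eigs) Q^T

def nystrom_err(K, m, rng):
    """Frobenius-norm error of a uniform-column Nystrom approximation."""
    cols = rng.choice(K.shape[0], size=m, replace=False)
    C, W = K[:, cols], K[np.ix_(cols, cols)]
    return np.linalg.norm(K - C @ np.linalg.pinv(W) @ C.T, 'fro')

for m in (25, 50, 100, 200):
    print("m =", m, " error =", nystrom_err(K, m, rng))
```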